🎮 SIMT Execution - hello

Discussed on Hacker News

🎨Chroma Towards Data Science·

GPU-Resident Top-K for Agentic RAG: I Built a CUDA Kernel So My Retrieval Step Would Stop Bouncing Off the GPU

⚡Hardware Acceleration i-programmer.info·

Lemonade SDK Adds Nvidia CUDA Support

Covers Show HN: Lemonade: Run LLMs Locally with GPU and NPU Acceleration

⚡Hardware Acceleration Carscoops·

Got A Bugatti W16 Lying Around? A Designer Has The Perfect ‘Cuda For It

Less-relevant results

⚡Hardware Acceleration developer.nvidia.com·

Boosting MoE Training Throughput with Advanced Fusion Kernels

🔧LLVM IR Optimization hiraditya.github.io·

Loop Unrolling in the ML Era

Discussed on Hacker News

⚡Hardware Acceleration GitHub·

I got tired of not understanding how vLLM works under the hood, so I built my own mini inference engine from scratch.

Discussed on r/LLM

💬Prompt Engineering DEV Community·

llama-bench skipped FA on capable GPUs — b9437 corrects it

Covers 2 stories including GitHub here . You can follow the build instructions below as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inferen...

Discussed on DEV

🔬Deep Learning Flox·

Training nanoGPT on Slurm with a Nix-Pinned Environment

Covers The new vscode.dev search function is absolute shit.

Discussed on Hacker News

🔬Deep Learning GitHub·

GPU Puzzles (2021)

Discussed on Hacker News

🔬Deep Learning GitHub·

open-source Jarvis project

Discussed on r/LLM

⚡Hardware Acceleration GitHub·

Running a 35B MoE model on a 2017 AMD RX 580 8GB via Vulkan (no ROCm/CUDA)

Discussed on Hacker News

🌟Ray Tracing arxiv.org·

Realizing Native INT8 Compute for Diffusion Transformers on Consumer GPUs: A Fused INT8 GEMM Kernel for Ideogram 4.0

🔬Deep Learning GitHub

pytorch/executorch ciflow/cuda/20384

⚡Hardware Acceleration DEV Community·

Registers, Lanes, and Berry Phase: Lifting Siunertaq from Batch Graphs to the Complex Plane

Discussed on DEV

⚡Hardware Acceleration GitHub·

Show HN: cuTile Rust: Safe, data-race-free GPU kernels in Rust

Covers 2 stories including AlterLang InterCode: A Native Intercomprehension Paradigm in Programming, Powered by GuruDev

Covered by indiehacker.news

Discussed on Hacker News and DEV

💬Prompt Engineering GitHub·

EAGLE support merged into llama.cpp

Discussed on r/LocalLLaMA

🔬Deep Learning GitHub

pytorch/executorch ciflow/cuda/20288


From Tokens to Regions: CUDA-Sensitive Instruction Tuning for GPU Kernel Generation

The Most Important Nvidia Product Isn't a Chip. It's This.

Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch
